Conditional Information Bottleneck Clustering
Abstract
We present an extension of the well-known information bottleneck framework, called the conditional information bottleneck, which takes negative relevance information into account by maximizing a conditional mutual information score. This general approach can be used in a data mining context to extract relevant information that is at the same time novel relative to known properties or structures of the data. We describe possible applications of the conditional information bottleneck in information retrieval and text mining for recovering non-redundant clustering solutions, and report experimental results on the WebKB data set that validate the approach.
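The abstract does not spell out the functional form of the objective, so the following is only a minimal sketch. It assumes the conditional mutual information score is I(Y; C | Z), with Y the relevance variable, C the cluster assignment, and Z the known structure to be conditioned away, traded off against the compression term I(X; C) through a Lagrange parameter beta. The function name conditional_ib_objective, the Lagrangian form, and the NumPy formulation are illustrative assumptions, not taken from the paper.

import numpy as np

def conditional_ib_objective(p_xyz, q_c_given_x, beta, eps=1e-12):
    """Score a soft clustering q(c|x) under an assumed conditional-IB
    Lagrangian  I(Y; C | Z) - (1/beta) * I(X; C).

    p_xyz       : array (nX, nY, nZ), joint distribution p(x, y, z)
    q_c_given_x : array (nX, nC), cluster membership probabilities
    beta        : assumed trade-off parameter
    """
    p_x = p_xyz.sum(axis=(1, 2))                              # p(x)
    # p(c, y, z) = sum_x q(c|x) p(x, y, z)
    p_cyz = np.einsum('xc,xyz->cyz', q_c_given_x, p_xyz)
    p_cz = p_cyz.sum(axis=1)                                  # p(c, z)
    p_yz = p_cyz.sum(axis=0)                                  # p(y, z)
    p_z = p_yz.sum(axis=0)                                    # p(z)
    # I(Y; C | Z) = sum p(c,y,z) log[ p(c,y,z) p(z) / (p(c,z) p(y,z)) ]
    ratio = (p_cyz * p_z[None, None, :] + eps) / (p_cz[:, None, :] * p_yz[None, :, :] + eps)
    i_yc_given_z = np.sum(p_cyz * np.log(ratio))
    # I(X; C) = sum p(x) q(c|x) log[ q(c|x) / p(c) ]
    p_xc = p_x[:, None] * q_c_given_x                         # p(x, c)
    p_c = p_xc.sum(axis=0)
    i_xc = np.sum(p_xc * np.log((q_c_given_x + eps) / (p_c[None, :] + eps)))
    return i_yc_given_z - i_xc / beta

In a complete clustering procedure one would re-estimate q(c|x) so as to increase this score; the optimization schedule actually used in the paper is not described in the abstract.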
Similar Papers
Interpreting Classifiers by Multiple Views
Next to prediction accuracy, interpretability is one of the fundamental performance criteria for machine learning. While high-accuracy learners have been explored intensively, interpretability still poses a difficult problem. To combine accuracy and interpretability, this paper introduces a framework which combines an approximate model with a severely restricted number of features with a mor...
The information bottleneck and geometric clustering
The information bottleneck (IB) approach to clustering takes a joint distribution P(X, Y) and maps the data X to cluster labels T which retain maximal information about Y (Tishby et al., 1999). This objective results in an algorithm that clusters data points based upon the similarity of their conditional distributions P(Y | X). This is in contrast to classic “geometric clustering” algorithms ...
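For reference, the standard IB algorithm mentioned in this entry (Tishby et al., 1999) can be summarized by a self-consistent fixed-point update; the sketch below, with the assumed function name ib_iteration and a NumPy formulation, shows one such update for a soft assignment q(t|x).

import numpy as np

def ib_iteration(p_xy, q_t_given_x, beta, eps=1e-12):
    """One fixed-point update of iterative IB:
    q(t|x) is proportional to q(t) * exp(-beta * KL(p(y|x) || q(y|t))).

    p_xy        : array (nX, nY), joint distribution P(X, Y)
    q_t_given_x : array (nX, nT), current soft assignment of x to clusters t
    """
    p_x = p_xy.sum(axis=1)                                    # p(x)
    p_y_given_x = p_xy / (p_x[:, None] + eps)                 # p(y|x)
    q_t = p_x @ q_t_given_x                                   # q(t) = sum_x p(x) q(t|x)
    # q(y|t) = sum_x q(t|x) p(x, y) / q(t)
    q_y_given_t = (q_t_given_x.T @ p_xy) / (q_t[:, None] + eps)
    # KL(p(y|x) || q(y|t)) for every pair (x, t)
    log_ratio = np.log(p_y_given_x[:, None, :] + eps) - np.log(q_y_given_t[None, :, :] + eps)
    kl = np.sum(p_y_given_x[:, None, :] * log_ratio, axis=2)
    new_q = q_t[None, :] * np.exp(-beta * kl)
    return new_q / new_q.sum(axis=1, keepdims=True)

Iterating this update until q(t|x) stabilizes reproduces the clustering behaviour described above: points x with similar conditional distributions p(y|x) end up in the same cluster t.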
Information Bottleneck Co-clustering
Co-clustering has emerged as an important approach for mining contingency data matrices. We present a novel approach to co-clustering based on the Information Bottleneck principle, called Information Bottleneck Co-clustering (IBCC), which supports both soft-partition and hard-partition co-clusterings, and leverages an annealing-style strategy to bypass local optima. Existing co-clustering method...
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have demonstrated their performance in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN which has a bottleneck layer among its fully connected layers. The bottleneck fea...
An Analysis of Model-based Clustering, Competitive Learning, and Information Bottleneck
This paper provides a general formulation of probabilistic model-based clustering with deterministic annealing (DA), which leads to a unifying analysis of k-means, EM clustering, soft competitive learning algorithms (e.g., the self-organizing map), and the information bottleneck. The analysis points out an interesting yet not well-recognized connection between k-means and EM clustering: they are jus...